Results 1 - 9 of 9
1.
Bull Math Biol ; 85(7): 68, 2023 Jun 13.
Article in English | MEDLINE | ID: mdl-37310552

ABSTRACT

Homogeneity across lineages is a general assumption in phylogenetics, according to which nucleotide substitution rates are common to all lineages. Many phylogenetic methods relax this hypothesis but keep a model simple enough to make the process of sequence evolution tractable. On the other hand, dealing successfully with the general case (heterogeneity of rates across lineages) is one of the key features of phylogenetic reconstruction methods based on algebraic tools. The goal of this paper is twofold. First, we present a new weighting system for quartets (ASAQ) based on algebraic and semi-algebraic tools, and therefore especially suited to data evolving under heterogeneous rates. This method combines the weights of two previous methods by means of a test based on the positivity of the branch lengths estimated with the paralinear distance. ASAQ is statistically consistent when applied to data generated under the general Markov model, accounts for rate and base-composition heterogeneity among lineages, and assumes neither stationarity nor time-reversibility. Second, we test and compare the performance of several quartet-based methods for phylogenetic tree reconstruction (namely QFM, wQFM, quartet puzzling, weight optimization and Willson's method) in combination with several weighting systems, including ASAQ weights and other weights based on algebraic and semi-algebraic methods or on the paralinear distance. These tests, applied to both simulated and real data, support weight optimization with ASAQ weights as a reliable and successful reconstruction method that improves upon the accuracy of global methods (such as neighbor-joining or maximum likelihood) in the presence of long branches or on mixtures of distributions on trees.
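ASAQ's combination rule hinges on a positivity test for branch lengths estimated with the paralinear distance. As a minimal sketch under stated assumptions, the Python below computes the paralinear distance between two aligned sequences and fits the five branch lengths of a quartet topology with four-point formulas. The function names, the leaf labels 1 to 4 and the averaged pendant-length estimator are our own choices, not necessarily the authors'. How ASAQ then merges the two underlying weight systems is not described in the abstract and is not reproduced here.

```python
import math

BASES = "ACGT"


def det(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    m = [list(row) for row in matrix]
    n = len(m)
    result = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if m[pivot][col] == 0.0:
            return 0.0
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            result = -result
        result *= m[col][col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n):
                m[r][c] -= factor * m[col][c]
    return result


def paralinear(x, y):
    """Paralinear distance (Lake 1994) between two aligned DNA sequences."""
    n = len(x)
    joint = [[0.0] * 4 for _ in range(4)]  # joint base frequencies J_xy
    for a, b in zip(x, y):
        joint[BASES.index(a)][BASES.index(b)] += 1.0 / n
    fx = [sum(row) for row in joint]                             # composition of x
    fy = [sum(joint[i][j] for i in range(4)) for j in range(4)]  # composition of y
    dj = det(joint)
    if dj <= 0 or min(fx) == 0 or min(fy) == 0:
        raise ValueError("paralinear distance is undefined for this pair")
    return -0.25 * (math.log(dj) - 0.5 * (math.log(math.prod(fx)) + math.log(math.prod(fy))))


def quartet_branch_lengths(dist, split):
    """Fit the five branch lengths of quartet topology ij|kl to pairwise distances.

    dist maps frozenset({u, v}) to a distance. Pendant lengths average over both
    leaves of the opposite cherry (an assumed estimator, one of several options).
    """
    (i, j), (k, l) = split

    def d(u, v):
        return dist[frozenset((u, v))]

    def pendant(u, sib, far):
        # (d(u, sib) + mean over the far cherry of d(u, w) - d(sib, w)) / 2
        return (d(u, sib) + sum(d(u, w) - d(sib, w) for w in far) / 2) / 2

    return {
        "internal": (d(i, k) + d(i, l) + d(j, k) + d(j, l) - 2 * d(i, j) - 2 * d(k, l)) / 4,
        i: pendant(i, j, (k, l)), j: pendant(j, i, (k, l)),
        k: pendant(k, l, (i, j)), l: pendant(l, k, (i, j)),
    }


def all_positive(lengths):
    """Positivity test: True when every fitted branch length is strictly positive."""
    return all(v > 0 for v in lengths.values())


# The three unrooted topologies on leaves 1-4.
SPLITS = [((1, 2), (3, 4)), ((1, 3), (2, 4)), ((1, 4), (2, 3))]
```

For an additive set of distances on the tree 12|34, the fitted internal branch equals its true length for 12|34 and minus half of it for the two other topologies, so only the true topology passes the test. With distances estimated from finite alignments this need not hold.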


Subjects
Mathematical Concepts; Models, Biological; Phylogeny; Nucleotides
2.
IEEE/ACM Trans Comput Biol Bioinform ; 18(6): 2855-2861, 2021.
Article in English | MEDLINE | ID: mdl-34339375

ABSTRACT

We present the phylogenetic quartet reconstruction method SAQ (Semi-Algebraic Quartet reconstruction). SAQ is consistent with the most general Markov model of nucleotide substitution and, in particular, allows for rate heterogeneity across lineages. Based on the algebraic and semi-algebraic description of distributions that arise from the general Markov model on a quartet, the method outputs normalized weights for the three trivalent quartets, which can be used as input to quartet-based methods. We show that SAQ is a highly competitive method that outperforms most well-known reconstruction methods on data simulated under the general Markov model on 4-taxon trees. Moreover, it also performs well on data that violate its underlying assumptions.


Subjects
Computational Biology/methods; Markov Chains; Models, Genetic; Phylogeny; Algorithms
3.
J Math Biol ; 80(4): 995-1019, 2020 Mar.
Article in English | MEDLINE | ID: mdl-31705189

ABSTRACT

Deciding whether a substitution matrix is embeddable (i.e. whether the corresponding Markov process has a continuous-time realization) is an open problem even for $4 \times 4$ matrices. We study the embedding problem and rate identifiability for the K80 model of nucleotide substitution. For these $4 \times 4$ matrices, we fully characterize the set of embeddable K80 Markov matrices and the set of embeddable matrices for which rates are identifiable. In particular, we describe an open subset of embeddable matrices with non-identifiable rates. This set contains matrices with positive eigenvalues and also diagonal-largest-in-column matrices, which may have consequences for parameter estimation in phylogenetics. Finally, we compute the relative volumes of embeddable K80 matrices and of embeddable matrices with identifiable rates. This study completes the work on the embedding problem for the more general K81 model and its submodels, which the last two authors initiated in a separate paper.
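For K80 matrices the principal logarithm has a closed form in terms of the two distinct eigenvalues $\lambda_2 = a + b - 2c$ and $\lambda_3 = a - b$, where $a$, $b$, $c$ are the no-change, transition and transversion probabilities. The following minimal sketch (hypothetical helper names, states assumed ordered A, G, C, T) builds a K80 matrix from rates and checks whether its principal logarithm is a K80 rate matrix with nonnegative rates. That is only a sufficient condition for embeddability, not the paper's full characterization, which must also handle the other real logarithms that the repeated eigenvalue $\lambda_3$ makes possible.

```python
import math


def k80_matrix(alpha, beta, t):
    """K80 Markov matrix exp(tQ) with states ordered A, G, C, T.

    alpha is the transition rate (A<->G, C<->T) and beta the rate of each of the
    two transversions leaving a base.
    """
    l2 = math.exp(-4 * beta * t)            # eigenvalue a + b - 2c
    l3 = math.exp(-2 * (alpha + beta) * t)  # eigenvalue a - b (multiplicity two)
    a = 0.25 + 0.25 * l2 + 0.5 * l3         # no change
    b = 0.25 + 0.25 * l2 - 0.5 * l3         # transition
    c = 0.25 - 0.25 * l2                    # each transversion
    return [[a, b, c, c], [b, a, c, c], [c, c, a, b], [c, c, b, a]]


def principal_log_rates(m):
    """Rates (alpha*t, beta*t) read off the principal logarithm of a K80 matrix.

    Returns None when the principal logarithm is not a K80 rate matrix with
    nonnegative rates. None does NOT prove non-embeddability: other real
    logarithms (possible because a - b is a repeated eigenvalue) are not checked.
    """
    a, b, c = m[0][0], m[0][1], m[0][2]
    l2, l3 = a + b - 2 * c, a - b
    if l2 <= 0 or l3 <= 0:
        return None  # the principal logarithm is not real
    beta_t = -math.log(l2) / 4
    alpha_t = -math.log(l3) / 2 - beta_t
    if alpha_t < 0 or beta_t < 0:
        return None
    return alpha_t, beta_t
```

The two sign conditions in `principal_log_rates` amount to $\lambda_2 \le 1$ (for $\beta \ge 0$) and $\lambda_2 \ge \lambda_3^2$ (for $\alpha \ge 0$).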


Subjects
Models, Genetic; Mutation Rate; Phylogeny; Evolution, Molecular; Markov Chains; Mathematical Concepts; Mutation; Nucleotides/genetics
5.
J Math Biol ; 74(5): 1107-1138, 2017 Apr.
Article in English | MEDLINE | ID: mdl-27604275

ABSTRACT

The reconstruction of phylogenetic trees from molecular sequence data relies on modelling site substitutions by a Markov process, or a mixture of such processes. In general, allowing mixed processes can make different tree topologies indistinguishable from the data, even for infinitely long sequences. However, when the underlying Markov process supports linear phylogenetic invariants, the identifiability of the tree topology can be restored, provided these invariants are sufficiently informative. In this paper, we investigate a class of processes that support linear invariants once the stationary distribution is fixed: the 'equal input model'. This model generalizes the 'Felsenstein 1981' model (and thereby the Jukes-Cantor model) from four states to an arbitrary number of states (finite or infinite), and it can also be described by a 'random cluster' process. We describe the structure and dimension of the vector spaces of phylogenetic mixtures and of linear invariants for any fixed phylogenetic tree (and for all trees, the so-called 'model invariants'), on any number $n$ of leaves. We also provide a precise description of the space of mixtures and linear invariants for the special case of $n = 4$ leaves. By combining techniques from discrete random processes and (multi)linear algebra, our results build on a classic result first established by James Lake (Mol Biol Evol 4:167-191, 1987).
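The closed form of the equal input model's transition matrices can be checked directly. With rate matrix $Q = \mu(\mathbf{1}\pi^T - I)$ one has $Q^2 = -\mu Q$, hence $e^{tQ} = e^{-\mu t} I + (1 - e^{-\mu t})\mathbf{1}\pi^T$. The sketch below (hypothetical function name; finite state spaces only) builds this matrix for any number of states. With uniform $\pi$ on four states and $\mu = 4/3$ it reduces to the Jukes-Cantor matrix with branch length measured in expected substitutions per site. It does not compute the linear invariants studied in the paper.

```python
import math


def equal_input_matrix(pi, mu, t):
    """Transition matrix of the equal input model on len(pi) states.

    The rate matrix is Q = mu * (1 pi^T - I), i.e. Q[i][j] = mu * pi[j] for i != j.
    Because Q @ Q = -mu * Q, exp(tQ) = exp(-mu t) I + (1 - exp(-mu t)) 1 pi^T:
    with probability exp(-mu t) the state is kept, otherwise it is redrawn from pi
    (which may land on the same state again).
    """
    keep = math.exp(-mu * t)
    k = len(pi)
    return [[keep * (i == j) + (1 - keep) * pi[j] for j in range(k)] for i in range(k)]
```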


Subjects
Models, Biological; Phylogeny; Markov Chains; Molecular Sequence Data
6.
Syst Biol ; 65(2): 280-91, 2016 Mar.
Article in English | MEDLINE | ID: mdl-26559009

ABSTRACT

One reason classical phylogenetic reconstruction methods fail to infer the underlying topology correctly is that they assume oversimplified models. In this article, we propose a quartet reconstruction method consistent with the most general Markov model of nucleotide substitution, which can also deal with data coming from mixtures on the same topology. Our proposed method uses phylogenetic invariants and provides a system of weights that can be used as input for quartet-based methods. We study its performance on real data and on a wide range of simulated 4-taxon data (both time-homogeneous and nonhomogeneous, with or without among-site rate heterogeneity, and with different branch length settings). We compare it to the classical methods of neighbor-joining (with the paralinear distance), maximum likelihood (with different underlying models), and maximum parsimony. Our results show that this method is accurate and robust, performs similarly to maximum likelihood when the data satisfy the assumptions of both methods, and outperforms the other methods when these are based on inappropriate substitution models. If alignments are long enough, it also outperforms other methods when some of its assumptions are violated.
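The phylogenetic invariants mentioned here can be illustrated by the classical rank condition on flattenings for the general Markov model: on the quartet 12|34, the 256 pattern probabilities arranged as a 16×16 matrix with rows indexed by the states at leaves 1 and 2 have rank at most 4, whereas the flattenings for the splits 13|24 and 14|23 generically have larger rank. The minimal sketch below checks the true-split case exactly with rational arithmetic on hypothetical parameters (leaves are indexed 0 to 3 in the code). The abstract does not say which invariants the method evaluates or how it turns them into weights, so this is background rather than the authors' algorithm.

```python
from fractions import Fraction
from itertools import product

STATES = range(4)
SPLITS = [((0, 1), (2, 3)), ((0, 2), (1, 3)), ((0, 3), (1, 2))]


def example_matrix(seed):
    """Hypothetical Markov matrix with rational entries; strictly diagonally
    dominant, hence invertible."""
    rows = []
    for i in STATES:
        w = [Fraction(1 + (7 * i + 3 * j + seed) % 5 + (20 if i == j else 0)) for j in STATES]
        total = sum(w)
        rows.append([x / total for x in w])
    return rows


def gm_quartet(pi, m1, m2, me, m3, m4):
    """Leaf-pattern distribution of the general Markov model on the quartet 12|34.

    The hidden root r joins leaves 0 and 1; the edge matrix me leads to the
    hidden node s that joins leaves 2 and 3.
    """
    return {
        x: sum(pi[r] * m1[r][x[0]] * m2[r][x[1]] * me[r][s] * m3[s][x[2]] * m4[s][x[3]]
               for r in STATES for s in STATES)
        for x in product(STATES, repeat=4)
    }


def flattening(p, split):
    """16 x 16 matrix: rows indexed by the states of split[0], columns by split[1]."""
    (i, j), (k, l) = split
    pairs = list(product(STATES, repeat=2))
    mat = []
    for a, b in pairs:
        row = []
        for c, d in pairs:
            x = [0] * 4
            x[i], x[j], x[k], x[l] = a, b, c, d
            row.append(p[tuple(x)])
        mat.append(row)
    return mat


def rank(mat):
    """Exact rank by Gauss-Jordan elimination over the rationals."""
    m = [list(row) for row in mat]
    r = 0
    for col in range(len(m[0])):
        piv = next((i for i in range(r, len(m)) if m[i][col] != 0), None)
        if piv is None:
            continue
        m[r], m[piv] = m[piv], m[r]
        for i in range(len(m)):
            if i != r and m[i][col] != 0:
                f = m[i][col] / m[r][col]
                m[i] = [u - f * v for u, v in zip(m[i], m[r])]
        r += 1
        if r == len(m):
            break
    return r


pi = [Fraction(1, 10), Fraction(2, 10), Fraction(3, 10), Fraction(4, 10)]
p = gm_quartet(pi, *(example_matrix(s) for s in range(5)))

if __name__ == "__main__":
    for split in SPLITS:
        print(split, rank(flattening(p, split)))
```

Printing the ranks for the three entries of `SPLITS` shows the contrast with the incompatible splits; the exact ranks obtained for those depend on the parameters, so none are quoted here. Real data give noisy frequencies, so practical methods typically measure how far a flattening is from rank 4 rather than testing exact rank.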


Subjects
Classification/methods; Models, Biological; Phylogeny; Biological Evolution; Candida albicans/classification; Candida albicans/genetics; Computer Simulation; Saccharomyces/classification; Saccharomyces/genetics
7.
BMC Evol Biol ; 14: 132, 2014 Jun 17.
Article in English | MEDLINE | ID: mdl-24938507

ABSTRACT

BACKGROUND: Reconstructing the phylogenetic tree topology of four taxa remains one of the main challenges in phylogenetics. The difficulties lie in using evolutionary models that are not too restrictive and in dealing correctly with the long-branch attraction problem. The correct reconstruction of 4-taxon trees is crucial for quartet-based methods to work and to recover large phylogenies. METHODS: We adapt the well-known expectation-maximization algorithm to evolutionary Markov models on phylogenetic 4-taxon trees. We then use this algorithm to estimate the substitution parameters, compute the corresponding likelihood, and infer the most likely quartet. RESULTS: In this paper we consider an expectation-maximization method for maximizing the likelihood of (time-nonhomogeneous) evolutionary Markov models on trees. We study its success in reconstructing 4-taxon topologies and its performance as an input method for quartet-based phylogenetic reconstruction methods such as QFIT and QuartetSuite. Our results show that the proposed method outperforms neighbor-joining and the usual (time-homogeneous continuous-time) maximum likelihood methods on 4-leaved trees with among-lineage instantaneous rate heterogeneity, and performs similarly to the usual continuous-time maximum likelihood when the data satisfy the assumptions of both methods. CONCLUSIONS: The method presented in this paper is well suited for reconstructing the topology of any number of taxa via quartet-based methods and is highly accurate, especially for highly divergent trees and time-nonhomogeneous data.
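A minimal EM sketch for a discrete-time general Markov model on a quartet, with one free $4 \times 4$ matrix per edge so that substitution processes can differ among lineages: the E-step computes, for each leaf pattern, the posterior over the two hidden internal states, and the M-step renormalizes the expected counts. The initialization, number of iterations and function names are assumptions; the abstract does not give the authors' choices.

```python
import math
from itertools import product

S = range(4)
# Leaf orders that put the cherries of each topology in positions (0, 1) and (2, 3);
# leaves are named 1-4 in the keys and indexed 0-3 in the tuples.
SPLITS = {"12|34": (0, 1, 2, 3), "13|24": (0, 2, 1, 3), "14|23": (0, 3, 1, 2)}


def jc_like(diag):
    """Hypothetical starting/example matrix: diag on the diagonal, the rest shared evenly."""
    off = (1 - diag) / 3
    return [[diag if i == j else off for j in S] for i in S]


def hidden_weights(params, x):
    """Unnormalized posterior weights of the hidden states (r, s) given leaf pattern x."""
    pi, m1, m2, me, m3, m4 = params
    return {(r, s): pi[r] * m1[r][x[0]] * m2[r][x[1]] * me[r][s] * m3[s][x[2]] * m4[s][x[3]]
            for r in S for s in S}


def normalize(v):
    total = sum(v)
    return [w / total for w in v]


def em_step(params, counts):
    """One EM iteration. Returns the updated parameters and the log-likelihood
    of the parameters that were passed in."""
    root = [0.0] * 4
    edges = [[[0.0] * 4 for _ in S] for _ in range(5)]  # expected counts for m1, m2, me, m3, m4
    loglik = 0.0
    for x, n in counts.items():
        w = hidden_weights(params, x)
        px = sum(w.values())
        loglik += n * math.log(px)
        for (r, s), v in w.items():
            post = n * v / px  # E-step: expected number of sites with hidden states (r, s)
            root[r] += post
            edges[0][r][x[0]] += post
            edges[1][r][x[1]] += post
            edges[2][r][s] += post
            edges[3][s][x[2]] += post
            edges[4][s][x[3]] += post
    # M-step: maximum-likelihood estimates from the expected counts
    return [normalize(root)] + [[normalize(row) for row in m] for m in edges], loglik


def fit_quartet(counts, topology, params, iterations=100):
    """Run EM on one quartet topology; returns final parameters and log-likelihood trace."""
    order = SPLITS[topology]
    permuted = {tuple(x[i] for i in order): n for x, n in counts.items()}
    trace = []
    for _ in range(iterations):
        params, ll = em_step(params, permuted)
        trace.append(ll)
    return params, trace
```

Running `fit_quartet` for each key of `SPLITS` and keeping the topology with the largest final log-likelihood mirrors the "most likely quartet" step of the abstract. Because EM can stop at a local optimum, several starting points are advisable.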


Subjects
Algorithms; Classification/methods; Models, Genetic; Phylogeny; Biological Evolution; Likelihood Functions; Markov Chains
8.
Algorithms Mol Biol ; 7(1): 33, 2012 Nov 28.
Article in English | MEDLINE | ID: mdl-23190710

ABSTRACT

BACKGROUND: The selection of an evolutionary model to best fit given molecular data is usually a heuristic choice. In his seminal book, J. Felsenstein suggested that certain linear equations satisfied by the expected probabilities of patterns observed at the leaves of a phylogenetic tree could be used for model selection. It remained an open question, however, whether these equations were sufficient to fully characterize the evolutionary model under consideration. RESULTS: Here we prove that, for most equivariant models of evolution, the space of distributions satisfying these linear equations coincides with the space of distributions arising from mixtures of trees. In other words, we prove that the evolution of an observed multiple sequence alignment can be modeled by a mixture of phylogenetic trees under an equivariant evolutionary model if and only if the distribution of patterns at its columns satisfies the linear equations mentioned above. Moreover, we provide a set of linearly independent equations defining this space of phylogenetic mixtures for each equivariant model and for any number of taxa. Lastly, we use these results to perform a study of identifiability of phylogenetic mixtures. CONCLUSIONS: The space of phylogenetic mixtures under equivariant models is a linear space that fully characterizes the evolutionary model. We provide an explicit algorithm to obtain the equations defining these spaces for a number of models and taxa. Its implementation has proved to be a powerful tool for model selection.
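For an equivariant model with uniform root distribution, such as Jukes-Cantor, relabelling the four states by any permutation leaves every pattern probability unchanged, so patterns in the same orbit must have equal probability. These equalities are linear equations of the kind discussed here, but the abstract does not say they make up the full set used in the paper, and models equivariant under smaller groups (for example K81) yield fewer symmetry equalities. The sketch below (hypothetical helper names) enumerates them for the full symmetric group and checks them on a Jukes-Cantor-like star tree.

```python
import math
from itertools import product


def pattern_orbits(n_taxa, n_states=4):
    """Group leaf patterns into orbits under relabelling of the states.

    Two patterns share an orbit iff they induce the same partition of the taxa,
    so each pattern is keyed by relabelling its states in order of first use.
    """
    orbits = {}
    for pattern in product(range(n_states), repeat=n_taxa):
        relabel = {}
        key = tuple(relabel.setdefault(s, len(relabel)) for s in pattern)
        orbits.setdefault(key, []).append(pattern)
    return orbits


def symmetry_equations(n_taxa):
    """Pairs (u, v) encoding the linear equations p(u) - p(v) = 0."""
    pairs = []
    for members in pattern_orbits(n_taxa).values():
        pairs.extend((members[0], other) for other in members[1:])
    return pairs


def jc_star(diags):
    """Hypothetical example: pattern distribution on a star tree with uniform root
    and one JC-like edge matrix per leaf (diag on the diagonal)."""
    dist = {}
    for x in product(range(4), repeat=len(diags)):
        dist[x] = sum(0.25 * math.prod(d if r == xi else (1 - d) / 3 for d, xi in zip(diags, x))
                      for r in range(4))
    return dist
```

The number of orbits is the number of partitions of the taxa into at most four blocks, for example 5 for three taxa and 15 for four.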

9.
BMC Bioinformatics ; 13: 216, 2012 Aug 28.
Article in English | MEDLINE | ID: mdl-22928840

ABSTRACT

BACKGROUND: A number of software packages are available to generate DNA multiple sequence alignments (MSAs) evolved under continuous-time Markov processes on phylogenetic trees. However, no methods exist for simulating DNA MSAs directly from the transition matrices. Moreover, existing software is restricted to time-reversible models and is not optimized to generate nonhomogeneous data (i.e. data with distinct substitution rates on different lineages). RESULTS: We present the first package designed to generate MSAs evolving under discrete-time Markov processes on phylogenetic trees directly from substitution probability matrices. Given the input model and a phylogenetic tree in Newick format (with branch lengths measured as the expected number of substitutions per site), the algorithm produces DNA alignments of the desired length. GenNon-h is publicly available for download. CONCLUSION: The software presented here is an efficient tool for generating DNA MSAs on a given phylogenetic tree. GenNon-h provides users with nonstationary or nonhomogeneous phylogenetic data that are well suited for testing complex biological hypotheses and for exploring the limits of reconstruction algorithms and their robustness to such models.
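The core idea, simulating each site down the tree by sampling every child's base from the row of its branch's substitution matrix, can be sketched in a few lines. This is not GenNon-h's code: the dict-based tree stands in for the Newick input described in the abstract, and the function name and arguments are hypothetical.

```python
import random

BASES = "ACGT"


def simulate_alignment(tree, root_dist, length, seed=0):
    """Simulate a DNA alignment directly from per-branch substitution matrices.

    tree maps each internal node to a list of (child, matrix) pairs, where
    matrix[i][j] = P(child has BASES[j] | parent has BASES[i]); nodes that never
    appear as keys are leaves. Giving each branch its own matrix yields
    nonhomogeneous (and, in general, nonstationary) data.
    """
    rng = random.Random(seed)
    children = {child for kids in tree.values() for child, _ in kids}
    root = next(node for node in tree if node not in children)
    seqs = {}
    for _ in range(length):  # sites are simulated independently
        stack = [(root, rng.choices(range(4), weights=root_dist)[0])]
        while stack:
            node, state = stack.pop()
            kids = tree.get(node, [])
            if not kids:
                seqs.setdefault(node, []).append(BASES[state])
            for child, matrix in kids:
                stack.append((child, rng.choices(range(4), weights=matrix[state])[0]))
    return {name: "".join(chars) for name, chars in seqs.items()}
```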


Subjects
Phylogeny; Sequence Alignment/methods; Sequence Analysis, DNA/methods; Software; Algorithms; Evolution, Molecular; Markov Chains; Sequence Alignment/statistics & numerical data; Sequence Analysis, DNA/statistics & numerical data